Measuring Transitivity Using Untrained Annotators

Authors

  • Nitin Madnani
  • Jordan L. Boyd-Graber
  • Philip Resnik
Abstract

Hopper and Thompson (1980) defined a multi-axis theory of transitivity that goes beyond simple syntactic transitivity and captures how much “action” takes place in a sentence. Detecting these features requires a deep understanding of lexical semantics and real-world pragmatics. We propose two general approaches for creating a corpus of sentences labeled with respect to the Hopper-Thompson transitivity schema using Amazon Mechanical Turk. Both approaches assume no existing resources and incorporate all necessary annotation into a single system; this is done to allow for future generalization to other languages. The first task attempts to use language-neutral videos to elicit human-composed sentences with specified transitivity attributes. The second task uses an iterative process to first label the actors and objects in sentences and then annotate the sentences’ transitivity. We examine the success of these techniques and perform a preliminary classification of the transitivity of held-out data.

Hopper and Thompson (1980) created a multi-axis theory of Transitivity (we use capital “T” throughout the paper to differentiate it from conventional syntactic transitivity) that describes the volition of the subject, the affectedness of the object, and the duration of the action. In short, this theory goes beyond the simple grammatical notion of transitivity (whether verbs take objects — transitive — or not — intransitive) and captures how much “action” takes place in a sentence. Such notions of Transitivity are not apparent from surface features alone; identical syntactic constructions can have vastly different Transitivity. This well-established linguistic theory, however, is not useful for real-world applications without a Transitivity-annotated corpus. Given such a substantive corpus, conventional machine learning techniques could help determine the Transitivity of verbs within sentences.

Transitivity has been found to play a role in what is called “syntactic framing,” which expresses implicit sentiment (Greene and Resnik, 2009). In these contexts, the perspective or sentiment of the writer is reflected in the constructions used to express ideas. For example, a less Transitive construction might be used to deflect responsibility (e.g. “John was killed” vs. “Benjamin killed John”).

In the rest of this paper, we review the Hopper-Thompson transitivity schema and propose two relatively language-neutral methods to collect Transitivity ratings. The first asks humans to generate sentences with desired Transitivity characteristics. The second asks humans to rate sentences on dimensions from the Hopper-Thompson schema. We then discuss the difficulties of collecting such linguistically deep data, analyze the available results, and pilot an initial classifier on the Hopper-Thompson dimensions.
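The abstract does not specify how the preliminary classification of held-out data is carried out. As a purely illustrative sketch, assuming a corpus of sentences each annotated for a single Hopper-Thompson dimension (the example sentences, the binary "object affectedness" label, and the bag-of-words logistic-regression setup below are hypothetical and not the authors' method), a conventional supervised classifier might be trained and applied as follows:

    # Illustrative sketch only: features, labels, and data are assumed, not from the paper.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical annotated sentences for one Transitivity dimension
    # (1 = object highly affected, 0 = object unaffected or absent).
    train_sentences = [
        "Benjamin killed John",
        "The storm destroyed the barn",
        "Mary admired the painting",
        "John was sleeping",
    ]
    train_labels = [1, 1, 0, 0]

    # Bag-of-words features feeding a linear classifier; a real experiment
    # would use richer lexical/syntactic features and far more data.
    model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
    model.fit(train_sentences, train_labels)

    # Label the same dimension on held-out sentences.
    held_out = ["The children broke the window", "She was thinking about the trip"]
    print(list(zip(held_out, model.predict(held_out))))

In practice, one such classifier would be trained per Hopper-Thompson dimension, since the dimensions (volition, affectedness, duration, and so on) are annotated separately.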


Similar Articles

Anveshan: A Framework for Analysis of Multiple Annotators' Labeling Behavior

Manual annotation of natural language to capture linguistic information is essential for NLP tasks involving supervised machine learning of semantic knowledge. Judgements of meaning can be more or less subjective, in which case instead of a single correct label, the labels assigned might vary among annotators based on the annotators’ knowledge, age, gender, intuitions, background, and so on. We...


Predicting Human-Targeted Translation Edit Rate via Untrained Human Annotators

In the field of machine translation, automatic metrics have proven quite valuable in system development for tracking progress and measuring the impact of incremental changes. However, human judgment still plays a large role in the context of evaluating MT systems. For example, the GALE project uses human-targeted translation edit rate (HTER), wherein the MT output is scored against a post-edited...


Embracing Ambiguity: A Comparison of Annotation Methodologies for Crowdsourcing Word Sense Labels

Word sense disambiguation aims to identify which meaning of a word is present in a given usage. Gathering word sense annotations is a laborious and difficult task. Several methods have been proposed to gather sense annotations using large numbers of untrained annotators, with mixed results. We propose three new annotation methodologies for gathering word senses where untrained annotators are al...


Contextual Sentiment Analysis with Untrained Annotators

This work presents a proposal to perform contextual sentiment analysis using a supervised learning algorithm and disregarding the extensive training of annotators. To achieve this goal, a web platform was developed to perform the entire procedure outlined in this paper. The main contribution of the pipeline described in this article is to simplify and automate the annotation process through a s...


Multiplicity and word sense: evaluating and learning from multiply labeled word sense annotations

Supervised machine learning methods to model word sense often rely on human labelers to provide a single, ground truth sense label for each word in its context. The finegrained, sense label inventories preferred by lexicographers have been argued to lead to lower annotation reliability in measures of agreement among two or three human labelers (annotators). We hypothesize that annotators can ag...



Journal:

Volume   Issue

Pages   -

Publication date: 2010